Monocular Depth Estimation Network Based on Swin Transformer
Authors
Abstract
Estimating depth from a single image is challenging because a single 2D image may correspond to many different 3D scenes. Most deep-learning-based depth prediction methods extract features using convolutional kernels with small receptive fields, which results in deformed edges and inaccurate depth values for distant objects in the estimation results. To address this problem, we propose a monocular depth estimation network based on the Swin Transformer with an encoder-decoder structure. We construct an encoder network that can encode long-range spatial dependencies at various scales and across channels. The proposed decoder is in charge of fusing the encoded features through interpolation, concatenation, and convolution operations. Experiments on the KITTI and NYUv2 datasets show that our network produces more accurate results than state-of-the-art methods.
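The decoder operations named in the abstract (interpolation, concatenation, convolution) can be illustrated with a toy, dependency-free sketch. This is not the paper's implementation: the abstract does not specify the interpolation mode, kernel size, or channel counts, so nearest-neighbour upsampling, a 1x1 convolution, and all function names below (`upsample_nearest`, `concat_channels`, `conv1x1`, `fuse`) are assumptions chosen for illustration.

```python
# Toy sketch of one decoder fusion step: interpolation -> concatenation -> convolution.
# Feature maps are nested lists shaped [channels][height][width].
# Hypothetical names and choices; the abstract does not describe the actual decoder layers.

def upsample_nearest(feat, factor):
    """Nearest-neighbour interpolation: repeat each pixel `factor` times along H and W."""
    out = []
    for channel in feat:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(factor)]
            rows.extend(list(wide) for _ in range(factor))
        out.append(rows)
    return out

def concat_channels(a, b):
    """Concatenate two feature maps along the channel axis (spatial sizes must match)."""
    assert len(a[0]) == len(b[0]) and len(a[0][0]) == len(b[0][0])
    return a + b

def conv1x1(feat, weights):
    """1x1 convolution: per pixel, each output channel is a weighted sum of input channels.
    `weights` is shaped [out_channels][in_channels]."""
    height, width = len(feat[0]), len(feat[0][0])
    return [
        [[sum(wk * feat[k][i][j] for k, wk in enumerate(w_out)) for j in range(width)]
         for i in range(height)]
        for w_out in weights
    ]

def fuse(coarse, skip, weights, factor=2):
    """Upsample the coarse map, stack it with the finer map, then mix the channels."""
    up = upsample_nearest(coarse, factor)
    merged = concat_channels(up, skip)
    return conv1x1(merged, weights)

coarse = [[[1.0]]]                     # 1 channel, 1x1 (deep, low-resolution feature)
skip = [[[1.0, 2.0], [3.0, 4.0]]]      # 1 channel, 2x2 (shallow, high-resolution feature)
weights = [[0.5, 0.5]]                 # 1 output channel mixing 2 input channels
fused = fuse(coarse, skip, weights)
print(fused)
```

In a real network these steps would typically be tensor operations in a deep learning framework with learned convolution weights; the list-based version only makes the data flow explicit.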
Similar resources
Monocular Vision Based Relative Depth Estimation for Hand Gesture Recognition
In this paper, we focus on real-time relative hand depth estimation with cluttered backgrounds and variable illumination for a target application to hand pushing and pulling detection. The task is characterized by a lack of consistent internal contrast in the hand combined with the complex background. We adopt a detection-and-registration strategy to predict frame-to-frame scaling factor and mak...
Towards Domain Independence for Learning-based Monocular Depth Estimation
Modern autonomous mobile robots require a strong understanding of their surroundings in order to safely operate in cluttered and dynamic environments. Monocular depth estimation offers a geometry-independent paradigm to detect free, navigable space with minimum space and power consumption. These represent highly desirable features, especially for micro aerial vehicles. In order to guarantee rob...
Depth Estimation Using Monocular and Stereo Cues
Depth estimation in computer vision and robotics is most commonly done via stereo vision (stereopsis), in which images from two cameras are used to triangulate and estimate distances. However, there are also numerous monocular visual cues (such as texture variations and gradients, defocus, color/haze, etc.) that have heretofore been little exploited in such systems. Some of these cues apply even...
Bayesian depth estimation from monocular natural images.
Estimating an accurate and naturalistic dense depth map from a single monocular photographic image is a difficult problem. Nevertheless, human observers have little difficulty understanding the depth structure implied by photographs. Two-dimensional (2D) images of the real-world environment contain significant statistical information regarding the three-dimensional (3D) structure of the world t...
Aperture Supervision for Monocular Depth Estimation
We present a novel method to train machine learning algorithms to estimate scene depths from a single image, by using the information provided by a camera’s aperture as supervision. Prior works use a depth sensor’s outputs or images of the same scene from alternate viewpoints as supervision, while our method instead uses images from the same viewpoint taken with a varying camera aperture. To en...
Journal
Journal title: Journal of Physics
Year: 2023
ISSN: 0022-3700, 1747-3721, 0368-3508, 1747-3713
DOI: https://doi.org/10.1088/1742-6596/2428/1/012019